A Linear Vertex Kernel for 

Maximum Internal Spanning Tree 



Fedor V. Fomin* Serge Gaspers^ Saket Saurabh* 
Stephan Thomasse^ 



Abstract 

We present a polynomial time algorithm that for any graph G and
integer k ≥ 0, either finds a spanning tree with at least k internal
vertices, or outputs a new graph G_R on at most 3k vertices and an integer
k' such that G has a spanning tree with at least k internal vertices if
and only if G_R has a spanning tree with at least k' internal vertices. In
other words, we show that the Maximum Internal Spanning Tree
problem, parameterized by the number of internal vertices k, has a
3k-vertex kernel. Our result is based on an innovative application of a
classical min-max result about hypertrees in hypergraphs which states
that "a hypergraph H contains a hypertree if and only if H is
partition-connected."



1 Introduction 

In the Maximum Internal Spanning Tree problem (MIST), we are
given a graph G and the task is to find a spanning tree of G with a
maximum number of internal vertices. MIST is a natural generalization of the
Hamiltonian Path problem because an n-vertex graph has a Hamiltonian
path if and only if it has a spanning tree with n − 2 internal vertices.
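
The link to Hamiltonian Path can be checked on small trees. The following is a minimal sketch, assuming a tree is given as a list of edges; the helper name `count_internal` is ours and does not come from the paper. A path on n vertices has n − 2 internal vertices, while a star has only one.

```python
from collections import Counter

def count_internal(tree_edges):
    """Number of vertices of degree at least 2 in a tree given as an edge list."""
    degree = Counter()
    for u, v in tree_edges:
        degree[u] += 1
        degree[v] += 1
    return sum(1 for d in degree.values() if d >= 2)

path = [(i, i + 1) for i in range(5)]   # Hamiltonian path on 6 vertices
star = [(0, i) for i in range(1, 6)]    # star on 6 vertices
print(count_internal(path), count_internal(star))  # prints "4 1"
```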

In this paper we study a parameterized version of MIST. Parameterized
decision problems are defined by specifying the input (I), the parameter
(k), and the question to be answered. A parameterized problem that can
be solved in time f(k) · |I|^{O(1)}, where f is a function of k alone, is said to
be fixed parameter tractable (FPT). The natural parameter k for MIST is
the number of internal vertices in the spanning tree, and the parameterized
version of MIST, p-Internal Spanning Tree or p-IST for short, asks, for a
given graph G and integer k, whether G contains a spanning tree with at least
k internal vertices. It follows from Robertson and Seymour's Graph Minors
theory that p-IST is FPT [10]. Indeed, the property of not having a spanning
tree with at least k internal vertices is closed under taking minors, and thus
such graphs can be characterized by a finite set of forbidden minors. One of
the consequences of the Graph Minors theory is that every graph property
characterized by a finite set of forbidden minors is FPT, and thus p-IST is
FPT. These arguments are, however, not constructive. The first constructive
algorithm for p-IST is due to Prieto and Sloper [12] and has running time
2^{4k log k} · n^{O(1)}. Recently this result was improved by Cohen et al. [2], who
solved a more general directed version of the problem in time 49.4^k · n^{O(1)}.
In this paper we study p-IST from the kernelization viewpoint.

A parameterized problem is said to admit a polynomial kernel if there is
a polynomial time algorithm (where the degree of the polynomial is
independent of k), called a kernelization algorithm, that reduces the input
instance to an instance whose size is bounded by a polynomial p(k) in k, while
preserving the answer. This reduced instance is called a p(k) kernel for the
problem. Let us remark that the instance size and the number of vertices in
the instance may be different, and thus for bounding the number of vertices
in the reduced graph, the term p(k)-vertex kernel is often used. While many
problems on graphs are known to have polynomial kernels (parameterized by
the solution size), not many O(k), or linear-vertex, kernels are known
in the literature. Notable examples include a 2k-vertex kernel for Vertex
Cover [3], a k-vertex kernel for Set Splitting [6], and a 6k-vertex kernel
for Cluster Editing [5].

No linear-vertex kernel for p-IST was known prior to our work. Prieto
and Sloper [11] provided an O(k^3)-kernel for the problem and then improved
it to O(k^2) in [12]. The main result of this paper is that p-IST has a
3k-vertex kernel. The kernelization of Prieto and Sloper is based on the
so-called "Crown Decomposition Method" [1]. Here, we use a different method,
based on a min-max characterization of hypergraphs containing hypertrees
by Frank et al. [4]. As a corollary of the new kernelization, we obtain an
algorithm for solving p-IST running in time 8^k · n^{O(1)}.

The paper is organized as follows. In Section 2 we provide necessary
definitions and facts about graphs and hypergraphs. In Section 3 we give
the kernelization algorithm. Section 4 is devoted to the proof of the main
combinatorial lemma, which is central to the correctness of the kernelization
algorithm.
2 Preliminaries



2.1 Graphs 

Let G = (V, E) be an undirected simple graph with vertex set V and edge
set E. For any nonempty subset W ⊆ V, the subgraph of G induced by W
is denoted by G[W]. The neighborhood of a vertex v in G is N_G(v) = {u ∈
V : {u, v} ∈ E}, and for a vertex set S ⊆ V we set N_G(S) = ⋃_{v ∈ S} N_G(v) \ S.
The degree of vertex v in G is d_G(v) = |N_G(v)|. Sometimes, when the graph
is clear from the context, we omit the subscripts.

2.2 The Hypergraphic Matroid 

Let H = (V, E) be a hypergraph. A hyperedge e ∈ E is a subset of V. A
subset F of edges is a hyperforest if |∪F'| ≥ |F'| + 1 for every nonempty
subset F' of F, where ∪F' denotes the union of vertices contained in the
hyperedges of F'. This condition is also called the strong Hall condition,
where strong stands for the extra plus one added to the usual Hall condition.
A hyperforest with |V| − 1 edges is called a hypertree. Lorea proved (see [4]
or [7]) that M_H = (E, ℱ), where ℱ consists of the hyperforests of H, is a
matroid, called the hypergraphic matroid. Observe that these definitions are
well-known when restricted to graphs.
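
To make the strong Hall condition concrete, here is a brute-force sketch that checks it directly over all nonempty subfamilies. It is exponential and meant only for tiny examples; the function names `is_hyperforest` and `is_hypertree` are ours, and hyperedges are assumed to be given as Python sets.

```python
from itertools import combinations

def is_hyperforest(hyperedges):
    """Check |union of F'| >= |F'| + 1 for every nonempty subfamily F'."""
    edges = [frozenset(e) for e in hyperedges]
    for r in range(1, len(edges) + 1):
        for sub in combinations(edges, r):
            if len(frozenset().union(*sub)) < r + 1:
                return False
    return True

def is_hypertree(vertices, hyperedges):
    """A hypertree is a hyperforest with exactly |V| - 1 hyperedges."""
    return len(hyperedges) == len(vertices) - 1 and is_hyperforest(hyperedges)

V = [1, 2, 3, 4]
print(is_hypertree(V, [{1, 2, 3}, {3, 4}, {1, 4}]))   # True
print(is_hyperforest([{1, 2}, {2, 3}, {1, 3}]))       # False: a graph triangle
```

On ordinary graphs the second call reflects the familiar fact that a cycle is not a forest: three edges span only three vertices.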

Lovász proved (see [8]) that F is a hyperforest if and only if every
hyperedge e of F can be shrunk into an edge e' (that is, e' ⊆ e contains two
vertices of e) in such a way that the set F' consisting of these contracted
edges forms a forest in the usual sense, that is, a forest of a graph. Observe
that if F is a hypertree then its set of contracted edges F' forms a spanning
tree on V.

The border of a partition 𝒫 = {V_1, ..., V_p} of V is the set δ(𝒫) of
hyperedges of H which intersect at least two parts of 𝒫. A hypergraph is
partition-connected when |δ(𝒫)| ≥ |𝒫| − 1 for every partition 𝒫 of V. The
following theorem can be found in [4, Corollary 2.6].

Theorem 1. H contains a hypertree if and only if H is partition-connected.
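
The definition of partition-connectedness can be tested by brute force on small hypergraphs. The sketch below enumerates all partitions of V and compares the border size with the number of parts minus one. The helper names (`set_partitions`, `border_size`, `is_partition_connected`) are ours, and the enumeration is exponential, so this is an illustration rather than the polynomial algorithm discussed in the text.

```python
def set_partitions(items):
    """Yield all partitions of a list into nonempty blocks (as lists of sets)."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        # Put `first` into one of the existing blocks ...
        for i in range(len(partition)):
            yield partition[:i] + [partition[i] | {first}] + partition[i + 1:]
        # ... or into a new block of its own.
        yield partition + [{first}]

def border_size(partition, hyperedges):
    """Number of hyperedges that intersect at least two parts."""
    return sum(1 for e in hyperedges
               if sum(1 for part in partition if part & set(e)) >= 2)

def is_partition_connected(vertices, hyperedges):
    return all(border_size(p, hyperedges) >= len(p) - 1
               for p in set_partitions(list(vertices)))

V = [1, 2, 3, 4]
print(is_partition_connected(V, [{1, 2, 3}, {3, 4}, {1, 4}]))   # True
print(is_partition_connected(V, [{1, 2}, {1, 2}, {3, 4}]))      # False
```

In the second example the partition {1, 2}, {3, 4} has an empty border, so the hypergraph is not partition-connected and, by Theorem 1, contains no hypertree.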

The proof of Theorem 1 can be turned into a polynomial time algorithm,
that is, given a hypergraph H = (V, E) we can either find a hypertree or
find a partition 𝒫 of V such that |δ(𝒫)| < |𝒫| − 1 in polynomial time. For
the sake of completeness, we briefly mention a polynomial time algorithm
to do this, though the running time may be easily improved. Recall that
M_H = (E, ℱ), where ℱ consists of the hyperforests of H, is a matroid and
hence we can construct a hypertree, if one exists, greedily. We start with an
empty forest and iteratively try to grow our current hyperforest by adding
new edges. When inspecting a new edge we either reject or accept it in
our current hyperforest depending on whether by adding it we still have a
hyperforest. The only question is to be able to test efficiently if a given
collection of edges forms a hyperforest. In other words, we have to check
if the strong Hall condition holds. This can be done in polynomial time by
simply running the well-known polynomial time algorithm for testing the
usual Hall condition for every subhypergraph H \ v, where v is a vertex and
H \ v is the hypergraph containing all hyperedges e \ v for e ∈ E.
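
The greedy procedure just described can be sketched as follows. This is an illustrative implementation under our own assumptions: the usual Hall condition is tested with a simple augmenting-path bipartite matching (Kuhn's algorithm), which is one standard choice and not necessarily the one the authors had in mind, and all function names are ours.

```python
def saturating_matching_exists(edge_sets):
    """Usual Hall condition: can every hyperedge be matched to a distinct
    vertex it contains? Decided by augmenting paths (Kuhn's algorithm)."""
    match = {}  # vertex -> index of the hyperedge it currently represents

    def augment(i, seen):
        for v in edge_sets[i]:
            if v not in seen:
                seen.add(v)
                if v not in match or augment(match[v], seen):
                    match[v] = i
                    return True
        return False

    return all(augment(i, set()) for i in range(len(edge_sets)))

def is_hyperforest(vertices, hyperedges):
    """Strong Hall condition, tested as the usual Hall condition on
    the hypergraph H minus v for every vertex v."""
    return all(
        saturating_matching_exists([set(e) - {v} for e in hyperedges])
        for v in vertices
    )

def greedy_hypertree(vertices, hyperedges):
    """Return the indices of the hyperedges of a hypertree, or None if H has none."""
    chosen = []
    for i, e in enumerate(hyperedges):
        # Accept the hyperedge only if the current hyperforest stays a hyperforest.
        if is_hyperforest(vertices, [hyperedges[j] for j in chosen] + [e]):
            chosen.append(i)
    return chosen if len(chosen) == len(vertices) - 1 else None

V = [1, 2, 3, 4]
print(greedy_hypertree(V, [{1, 2}, {1, 2}, {1, 2, 3}, {3, 4}, {1, 4}]))  # [0, 2, 3]
print(greedy_hypertree(V, [{1, 2}, {1, 2}, {3, 4}]))                     # None
```

Because the hyperforests form the independent sets of a matroid, the greedy pass always ends with a maximum-size hyperforest, so returning None is correct whenever fewer than |V| − 1 hyperedges were accepted.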

We can also find a contraction of the edges of a hypertree into a spanning
tree in polynomial time. For this, consider any edge e of the hypertree with
more than two vertices (if none exist, we already have our tree). By a result
of Lovász [8] mentioned above, one of the vertices v ∈ e can be deleted from e
in such a way that we still have a hypertree. Hence we just find this vertex by
checking the strong Hall condition for every choice of e \ v where v ∈ e. This
implies that we need to apply the algorithm to test the strong Hall condition
at most |V| times to obtain the desired spanning tree. Consequently, there
exists a polynomial time algorithm which can find a contracted spanning
tree out of a partition-connected hypergraph.

We now turn to the co-NP certificate, that is, we want to exhibit a
partition 𝒫 of V such that |δ(𝒫)| < |𝒫| − 1 when H is not partition-connected.
The algorithm simply tries to contract every pair of vertices in H = (V, E)
and checks if the resulting hypergraph is partition-connected. When it is
not, we contract the two vertices, and recurse. We stop when the resulting
hypergraph H' is not partition-connected, and every contraction results in a
partition-connected hypergraph. Observe then that if a partition 𝒫 of H' is
such that |δ(𝒫)| < |𝒫| − 1 and 𝒫 has a part which is not a singleton, then
contracting two vertices of this part results in a non partition-connected
hypergraph. Hence, the singleton partition is the unique partition 𝒫 of H' such
that |δ(𝒫)| < |𝒫| − 1. This singleton partition corresponds to the partition
of H which gives our co-NP certificate.

3 Kernelization Algorithm 

Let G = (V, E) be a connected graph on n vertices and k ∈ ℕ be a parameter.
In this section we describe an algorithm that takes G and k as an input, and
in time polynomial in the size of G either solves p-IST, or produces a reduced
graph G_R on at most 3k vertices and an integer k' ≤ k, such that G has
a spanning tree with at least k internal vertices if and only if G_R has a
spanning tree with at least k' internal vertices. In other words, we show that
p-IST has a 3k-vertex kernel.

The algorithm is based on the following combinatorial lemma, which is
interesting in its own right. For two disjoint sets X, Y ⊆ V, we denote by
B(X, Y) the bipartite graph obtained from G[X ∪ Y] by removing all edges
with both endpoints in X or both endpoints in Y.

Lemma 2. If n ≥ 3, and I is an independent set of G of cardinality at least
2n/3, then there are nonempty subsets S ⊆ V \ I and L ⊆ I such that

(i) N(L) = S, and

(ii) B(S, L) has a spanning tree such that all vertices of S and |S| − 1
vertices of L are internal.

Moreover, given a graph on at least 3 vertices and an independent set of
cardinality at least 2n/3, such subsets can be found in time polynomial in the
size of G.

The proof of Lemma 2 is postponed to Section 4. Now we give the
description of the kernelization algorithm and use Lemma 2 to prove its
correctness. The algorithm consists of the following reduction rules.

Rule 1 If n ≤ 3k, then output graph G and stop. In this case G is a
3k-vertex kernel. Otherwise proceed with Rule 2.

Rule 2 Choose an arbitrary vertex v ∈ V and run a DFS (depth first search)
from v. If the DFS tree T has at least k internal vertices, then the
algorithm has found a solution and stops. Otherwise, because n > 3k,
T has at least 2n/3 + 2 leaves, and since all leaves but the root of
the DFS tree are pairwise nonadjacent, the algorithm has found an
independent set of G of cardinality at least 2n/3. Proceed with Rule 3.

Rule 3 (reduction) Find nonempty subsets of vertices S, L ⊆ V as in
Lemma 2. Add a vertex v_S and make it adjacent to every vertex in
N(S) \ L and add a vertex v_L and make it adjacent to v_S. Finally,
remove all vertices of S ∪ L. Let G_R = (V_R, E_R) be the new graph and
k' = k − 2|S| + 2. Go to Rule 1 with G := G_R and k := k'.
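
The mechanical parts of Rules 2 and 3 can be sketched in code. The following is a minimal illustration, assuming the graph is a dictionary mapping each vertex to the set of its neighbours and that the sets S and L of Lemma 2 are already available (finding them is the subject of Section 4 and is not implemented here). The function names and the labels "v_S" and "v_L" for the two new vertices are our own choices.

```python
def dfs_tree(adj, root):
    """Parent map of a DFS tree of a connected graph (iterative DFS)."""
    parent = {root: None}
    stack = [(root, iter(adj[root]))]
    while stack:
        u, neighbours = stack[-1]
        for w in neighbours:
            if w not in parent:
                parent[w] = u
                stack.append((w, iter(adj[w])))
                break
        else:
            stack.pop()  # all neighbours of u explored
    return parent

def rule2(adj, root, k):
    """Return ("solution", parent_map) or ("independent", set_of_leaves)."""
    parent = dfs_tree(adj, root)
    deg = {v: 0 for v in parent}
    for v, p in parent.items():
        if p is not None:
            deg[v] += 1
            deg[p] += 1
    if sum(1 for v in deg if deg[v] >= 2) >= k:
        return "solution", parent
    # Leaves of a DFS tree other than the root are pairwise nonadjacent.
    return "independent", {v for v in deg if deg[v] <= 1 and v != root}

def rule3(adj, S, L, k):
    """Replace S and L by the two new vertices v_S and v_L; return (G_R, k')."""
    S, L = set(S), set(L)
    outside = {w for s in S for w in adj[s]} - S - L      # N(S) \ L
    new_adj = {v: set(adj[v]) - S - L for v in adj if v not in S | L}
    new_adj["v_S"] = set(outside) | {"v_L"}
    new_adj["v_L"] = {"v_S"}
    for w in outside:
        new_adj[w].add("v_S")
    return new_adj, k - 2 * len(S) + 2

# A star with centre 0 and leaves 1..4, plus a tail 0 - 5 - 6.
G = {0: {1, 2, 3, 4, 5}, 1: {0}, 2: {0}, 3: {0}, 4: {0}, 5: {0, 6}, 6: {5}}
print(rule2(G, 0, 3))
print(rule3(G, {0}, {1, 2, 3, 4}, 3))
```

In this toy instance S = {0} and L = {1, 2, 3, 4} satisfy N(L) = S, and the reduced graph is the path 6, 5, v_S, v_L with k' = k, which has two internal vertices just as the original tree does.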

To prove the soundness of Rule 3, we need the following lemma. Here,
S and L are as in Lemma 2. If T is a tree and X a vertex set, we denote by
i_T(X) the number of vertices of X that are internal in T.

Lemma 3. If G has a spanning tree with k internal vertices, then G has a
spanning tree with at least k internal vertices in which all the vertices of S
and exactly |S| − 1 vertices of L are internal.

Proof. Let T be a spanning tree of G with k internal vertices. Denote by F
the forest obtained from T by removing all edges incident to L. Then, as
long as two vertices of S are in the same connected component of F, remove an
edge from F incident to one of these two vertices. Now, obtain the spanning
tree T' by adding to F the edges of a spanning tree of B(S, L) in which all
vertices of S and |S| − 1 vertices of L are internal (see Lemma 2). Clearly,
all vertices of S and |S| − 1 vertices of L are internal in T'. It remains to
show that T' has at least as many internal vertices as T.

Let U := V \ (S ∪ L). Then, we have that i_T(L) ≤ Σ_{u ∈ L} d_T(u) − |L|, as
every vertex in a tree has degree at least 1 and internal vertices have degree
at least 2. We also have i_{T'}(U) ≥ i_T(U) − (|L| + |S| − 1 − Σ_{u ∈ L} d_T(u)), as at
most |S| − 1 − (Σ_{u ∈ L} d_T(u) − |L|) edges incident to S are removed from F
to separate F \ L into |S| connected components, one for each vertex of S.
Thus,

\begin{align*}
i_{T'}(V) &= i_{T'}(U) + i_{T'}(S \cup L) \\
&\ge i_T(U) - \Bigl(|L| + |S| - 1 - \sum_{u \in L} d_T(u)\Bigr) + i_{T'}(S \cup L) \\
&= i_T(U) + \Bigl(\sum_{u \in L} d_T(u) - |L|\Bigr) - |S| + 1 + i_{T'}(S \cup L) \\
&\ge i_T(U) + i_T(L) - |S| + 1 + i_{T'}(S \cup L) \\
&= i_T(U) + i_T(L) - (|S| - 1) + (|S| + |S| - 1) \\
&= i_T(U) + i_T(L) + |S| \\
&\ge i_T(U) + i_T(L) + i_T(S) \\
&= i_T(V).
\end{align*}

This finishes the proof of the lemma. □

Lemma 4. Rule 3 is sound, |V_R| < |V|, and k' ≤ k.

Proof. We claim first that the resulting graph G_R = (V_R, E_R) has a spanning
tree with at least k' = k − 2|S| + 2 internal vertices if and only if the original
graph G has a spanning tree with at least k internal vertices. Indeed, assume
G has a spanning tree with ℓ ≥ k internal vertices. Then, let B(S, L) be as in
Lemma 2 and T be a spanning tree of G with ℓ internal vertices such that all
vertices of S and |S| − 1 vertices of L are internal (which exists by Lemma 3).
Because T[S ∪ L] is connected, every two distinct vertices u, v ∈ N_T(S) \ L
are in different connected components of T \ (L ∪ S). But this means that
the graph T' obtained from T \ (L ∪ S) by connecting v_S to all neighbors of
S in T \ (S ∪ L) is also a tree in which the degree of every vertex in N_G(S) \ L
is unchanged. The graph T'' obtained from T' by joining v_L to v_S is also a
tree. Then T'' has exactly ℓ − 2|S| + 2 internal vertices.

In the opposite direction, if G_R has a spanning tree T'' with ℓ − 2|S| + 2
internal vertices, then all neighbors of v_S in T'' are in different components
of T'' \ {v_S}. By Lemma 2 we know that B(S, L) has a spanning tree T_{SL} such
that all the vertices of S and |S| − 1 vertices of L are internal. We obtain
a spanning tree T of G by considering the forest T* = (T'' \ {v_S, v_L}) ∪ T_{SL}
and adding edges between different components to make it connected. For
each vertex u ∈ N_{T''}(v_S) \ {v_L}, add an edge uv to T*, where uv is an edge
of G and v ∈ S. By construction we know that such an edge always exists.
Moreover, the degrees of the vertices in N_G(S) \ L are the same in T as in
T''. Thus T is a spanning tree with ℓ internal vertices.

Finally, as |S| ≥ 1 and |L ∪ S| ≥ 3, we have that |V_R| < |V| and
k' ≤ k. □

Thus Rule 3 compresses the graph and we conclude with the following 
theorem. 

Theorem 5. p-IST has a 3k-vertex kernel.

Corollary 6. p-IST can be solved in time 8^k · n^{O(1)}.

Proof. Obtain a 3k-vertex kernel for the input graph G in polynomial time
using Theorem 5 and run the 2^n · n^{O(1)} time algorithm of Nederlof [9] on the
kernel. □
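
The arithmetic behind the bound can be spelled out as follows. Here n' denotes the number of vertices of the kernel, a symbol introduced only for this illustration; by Rule 1 and Lemma 4 we have n' ≤ 3k' ≤ 3k.

```latex
\[
  2^{n'} \cdot (n')^{O(1)}
  \;\le\; 2^{3k} \cdot (3k)^{O(1)}
  \;=\; 8^{k} \cdot k^{O(1)} .
\]
```

Adding the polynomial time spent on the kernelization itself, and assuming k ≤ n (otherwise the answer is trivially negative), gives the stated total of 8^k · n^{O(1)}.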



4 Proof of Lemma 2

In this section we provide the postponed proof of Lemma 2. Let G = (V, E)
be a connected graph on n vertices, I be an independent set of G of
cardinality at least 2n/3 and C := V \ I.

Let Y be a subset of V. A subset X ⊆ (V \ Y) has Y-expansion c, for
some c > 0, if for each subset Z of X, |N(Z) ∩ Y| ≥ c · |Z|. We first find an
independent set L ⊆ I whose neighborhood has L-expansion 2. For this, we
need the following result.
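
The definition of Y-expansion can be illustrated with a brute-force check over all nonempty subsets Z of X. This sketch is exponential in |X| and serves only to make the definition concrete; the function name `has_expansion` and the dictionary-of-neighbours representation are our assumptions.

```python
from itertools import combinations

def has_expansion(adj, X, Y, c):
    """Does every nonempty Z subset of X satisfy |N(Z) ∩ Y| >= c * |Z|?

    adj maps each vertex of X to the set of its neighbours.
    """
    X, Y = list(X), set(Y)
    for r in range(1, len(X) + 1):
        for Z in combinations(X, r):
            neighbours_in_Y = set().union(*(adj[z] for z in Z)) & Y
            if len(neighbours_in_Y) < c * r:
                return False
    return True

adj = {"a": {1, 2, 3}, "b": {3, 4}}
print(has_expansion(adj, ["a", "b"], {1, 2, 3, 4}, 2))  # True
print(has_expansion(adj, ["a", "b"], {1, 2, 3}, 2))     # False: b sees only vertex 3
```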

Lemma 7 ([13]). Let B be a nonempty bipartite graph with vertex bipartition
(X, Y) with |Y| ≥ 2|X| and such that every vertex of Y has at least one
neighbor in X. Then there exist nonempty subsets X' ⊆ X and Y' ⊆ Y such
that the set of neighbors of Y' in B is exactly X', and such that X' has
Y'-expansion 2. Moreover, such subsets X', Y' can be found in time polynomial
in the size of B.

By using Lemma 7 we find nonempty sets of vertices S' ⊆ C and L' ⊆ I
such that N(L') = S' and S' has L'-expansion 2.

Lemma 8. Let G = (V, E) be a connected graph on n vertices, I be an
independent set of G of cardinality at least 2n/3 and C := V \ I. Furthermore
let S' ⊆ C and L' ⊆ I such that N(L') = S' and S' has L'-expansion 2. Then
there exist nonempty subsets S ⊆ S' and L ⊆ L' such that

• B(S,L) has a spanning tree in which all the vertices of L have degree 
at most 2, 

• S has L-expansion 2, and 

• N(L) = S. 

Moreover, such sets S and L can be found in time polynomial in the size of 
G. 

Proof. The proof is by induction on |S'|. If |S'| = 1, the lemma holds with
S := S' and L := L'. Let H = (S', E') be the hypergraph with edge set
E' = {N(v) | v ∈ L'}. If H contains a hypertree, then the hypertree has
|S'| − 1 hyperedges and we can obtain a tree T_{S'} on S' by contracting edges.
We use this to find a subtree T' of B(S', L') spanning S' as follows: for every
edge e = uv of T_{S'} there exists a hyperedge corresponding to it and hence a
unique vertex, say w, in L'; we delete the edge e = uv from T_{S'} and add the
edges wu and wv to T_{S'}. Observe that the resulting subtree T' of B(S', L')
has the property that every vertex in T' which is in L' has degree 2 in it.
Finally, we extend T' to a spanning tree of B(S', L') by adding the remaining
vertices of L' as pendant vertices. All this can be done in polynomial time
using the algorithm in Section 2.2. Thus S' and L' are the sets of vertices
we are looking for. Otherwise, if H does not contain a hypertree, then
H is not partition-connected by Theorem 1. Then we can find, in polynomial
time, a partition 𝒫 = {P_1, P_2, ..., P_ℓ} of S' such that its border δ(𝒫)
contains at most ℓ − 2 hyperedges of H. Let b_i be the number of hyperedges
completely contained in P_i, where 1 ≤ i ≤ ℓ. Then there is j, 1 ≤ j ≤ ℓ, such
that b_j ≥ 2|P_j|. Indeed, otherwise |L'| ≤ (ℓ − 2) + Σ_{i=1}^{ℓ} (2|P_i| − 1) < 2|S'|,
which contradicts the choice of L' and S' and the fact that S' has
L'-expansion 2. Let X := P_j and Y := {w ∈ L' : N(w) ⊆ P_j}. We know that
|Y| ≥ 2|X| and hence by Lemma 7 there exist S* ⊆ X and L* ⊆ Y
such that S* has L*-expansion 2 and N(L*) = S*. Thus, by the induction
assumption, there exist S ⊆ S* and L ⊆ L* with the desired properties. □

Let S and L be as in Lemma 8. We will prove in the following that
there exists a spanning tree of B(S, L) such that all the vertices of S and
exactly |S| − 1 vertices of L are internal. Note that there cannot be more
than 2|S| − 1 internal vertices in a spanning tree of B(S, L) without creating
cycles. By Lemma 8 we know that there exists a spanning tree of B(S, L)
in which |S| − 1 vertices of L have degree exactly 2.

Consider the bipartite graph B_2 obtained from B(S, L) by adding a copy
S^c of S (each vertex in S has the same neighborhood as its copy in S^c and no
vertex of S^c is adjacent to a vertex in S). As |L| ≥ |S ∪ S^c| and each subset
Z of S ∪ S^c has at least |Z| neighbors in L, by Hall's theorem, there exists a
matching in B_2 saturating S ∪ S^c. This means that in B(S, L) there exist
two edge-disjoint matchings M_1 and M_2, both saturating S. We refer to the
edges from M_1 ∪ M_2 as the favorite edges.
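
The doubling argument translates directly into code. The sketch below builds the copies of S implicitly as pairs (s, 0) and (s, 1) and looks for a matching saturating all of them with an augmenting-path search (Kuhn's algorithm, chosen here for brevity). The function name `two_disjoint_matchings` and the input format are our assumptions.

```python
def two_disjoint_matchings(adj, S):
    """adj maps each s in S to its neighbours in L.

    Returns (M1, M2), two dicts s -> partner in L whose partners are all
    distinct, or None if B_2 has no matching saturating S and its copy.
    """
    # S-side of B_2: (s, 0) stands for s itself, (s, 1) for its copy.
    left = [(s, i) for s in S for i in (0, 1)]
    match = {}  # vertex of L -> element of `left` it is matched to

    def augment(x, seen):
        for y in adj[x[0]]:
            if y not in seen:
                seen.add(y)
                if y not in match or augment(match[y], seen):
                    match[y] = x
                    return True
        return False

    if not all(augment(x, set()) for x in left):
        return None
    M1 = {s: y for y, (s, i) in match.items() if i == 0}
    M2 = {s: y for y, (s, i) in match.items() if i == 1}
    return M1, M2

adj = {"a": [1, 2, 3], "b": [3, 4]}
print(two_disjoint_matchings(adj, ["a", "b"]))
```

Since every vertex of L is matched at most once in B_2, the two matchings use disjoint sets of partners, so each vertex of L is incident to at most one favorite edge.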

Lemma 9. B(S, L) has a spanning tree T such that all the vertices of S and
|S| − 1 vertices of L are internal in T.

Proof. Let T be a spanning tree of B(S, L) in which all vertices of L have
degree at most 2, obtained using Lemma 8. As T is a tree, exactly |S| − 1
vertices of L have degree 2 in T. As long as a vertex v ∈ S is not internal in
T, add a favorite edge uv to T which was not yet in T (u ∈ L), and remove
an appropriate edge from the tree which is incident to u so that T remains a
spanning tree. Vertex v becomes internal and the degree of u in T remains
unchanged. As u is incident to only one favorite edge, this rule increases
the number of favorite edges in T, even though it is possible that some other
vertex in S becomes a leaf. We apply this rule until it is no longer
applicable. The rule can be applied at most |S| times. In
the end, all the vertices of S are internal and |S| − 1 vertices of L are
internal, as their degrees remain the same. □

To conclude the proof of Lemma 2, we observe that S ⊆ C, L ⊆ I
and N(L) = S by the construction of S and L, and by Lemma 9, B(S, L)
has a spanning tree in which all the vertices of S and |S| − 1 vertices of L
are internal.